Abstract: Online auction and shopping websites such as eBay.com and Taobao.com allow individual users to buy and sell a
broad variety of goods and services worldwide. However, attackers also use these sites to conduct fraudulent activities
against honest parties for illegitimate profit. On Internet auction sites, auction fraud mainly involves the non-delivery of
products purchased through the site or the misrepresentation of a product advertised for sale. Malicious sellers may post a
non-existent item for bidding with a false description that deceives the buyer about its true value, and request that payments
be wired directly to them. Similarly, malicious buyers may make a purchase with a fraudulent credit card whose cardholder
address does not match the shipping address. Consumers, merchants, and the commercial auction websites themselves can all
be victims of online auction fraud. In this paper we study the problem of building models for an online auction fraud
detection system that evolves dynamically over time. We propose a Bayesian probit online model framework for auction fraud
detection.
I. Introduction

Online auction networks, such as eBay.com and taobao.com, have become popular trading platforms, offering a large
variety of products at competitive prices. Today, these networks have hundreds of billions of dollars in trading
volume and hundreds of billions of dollars in revenue. While online auction networks have many advantages over traditional
retail stores, many users are still reluctant to buy or sell products on these networks because they are concerned that the
sellers or buyers on these networks may not be reliable. To help users assess each other's honesty and integrity, online
auction networks often use reputation-based systems. For example, eBay allows the seller and the buyer to leave feedback for
each other after each transaction, and this feedback can be viewed by other users. A seller or buyer with more positive
comments can be regarded as a more reliable user.

Fraudsters, however, can collude with accomplices to accumulate bogus positive feedback and manipulate the
reputation system, which makes it very hard to evaluate a user's reliability from reputation (feedback) alone. It has
been observed in [1] that fraudsters and their accomplices are likely to form a dense bipartite core, because fraudsters
receive most of their feedback from accomplices and want to accumulate a large number of feedback comments as quickly as
possible. In this paper we study the problem of building models for an online auction fraud detection moderation system
that evolves dynamically over time. We propose a Bayesian probit online model framework for fraud detection. We apply
stochastic search variable selection (SSVS) [2], a well-known technique in the statistical literature, to handle the
dynamic evolution of feature importance in a principled way.

II. Related work 

In the past, attempts have been made to help users identify potential fraudsters. However, most of them are
"common sense" approaches recommended by a variety of authorities such as newspaper articles [3], law enforcement
organizations [4], or even the auction sites themselves [5]. These approaches usually suggest that users be cautious and
perform background checks on the sellers they wish to transact with. Such suggestions, however, require people to
maintain constant vigilance and to spend considerable time and effort investigating potential dealers before
carrying out a transaction. Reputation systems are used extensively by many auction sites to prevent fraud, but they are
usually very simple and can easily be foiled. In [6], the authors summarized the challenges that modern reputation systems
face, including the difficulty of eliciting honest feedback and of faithfully representing users' reputations. In
[7] and [8], the authors conducted empirical studies showing that the selling prices of goods are positively affected by the
seller's reputation, implying that people feel more confident buying from trustworthy sources. In summary, reputation systems
might not be an effective mechanism for preventing online fraud, because fraudsters can easily trick these systems by
manipulating their own reputation.

In [9], the authors categorized auction fraud into different types, but they did not formulate methods to combat
it. They suggest that an effective approach to fighting online auction fraud is for law enforcement and auction sites to
join forces, which unfortunately can be costly from both monetary and managerial perspectives. Authority propagation, an
area closely related to online fraud detection, has been studied extensively in the context of Web search. PageRank [10] and
HITS [11] treat a Web page as "important" if other "important" pages point to it. In effect, they propagate the importance
of Web pages over the hyperlinks connecting them. TrustRank [12] used trust propagation to detect Web spam; there, the
goal was to distinguish between "good" and "bad" sites (e.g., phishers, sites with adult content, etc.).
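The idea of propagating importance over hyperlinks can be illustrated with a minimal PageRank power iteration in Python. This is a textbook-style sketch, not code from [10]: the damping factor of 0.85, the uniform treatment of pages without out-links, the fixed iteration count, and the function name `pagerank` are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank (illustrative sketch).

    links maps each page to the list of pages it links to. Pages with no
    out-links (dangling pages) spread their score uniformly over all pages.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # every page first receives the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            targets = outs if outs else pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share  # importance flows along each hyperlink
        rank = new_rank
    return rank


# Hypothetical three-page web: "b" is linked from both "a" and "c".
scores = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
print(max(scores, key=scores.get))  # b
```

A page's score is high when high-scoring pages point to it, which is the same "important if important pages point to it" intuition described above; TrustRank applies a similar propagation to trust scores.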

International Journal of Modern Engineering Research (IJMER) 
www.ijmer.com Vol. 3, Issue. 4, Jul - Aug. 2013 pp-2507-2509 ISSN: 2249-6645

III. Proposed work 

A. Online Probit Regression 

Consider splitting continuous time into many equal-sized intervals. In each time interval we may observe
multiple expert-labeled cases, each indicating whether the case is considered fraud or non-fraud. Suppose there are $n_t$
observations in time interval $t$, and denote the $i$-th binary observation by $y_{it}$. If $y_{it} = 1$, the case is
considered fraud; otherwise it is considered non-fraud. Let the feature set of case $i$ at time interval $t$ be $x_{it}$.
The probit model can be written as follows:

$$P[y_{it} = 1 \mid x_{it}, \beta_t] = \Phi(x_{it}^\top \beta_t)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution $N(0, 1)$, and $\beta_t$ is the
unknown regression coefficient vector at time $t$.
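As a small illustration of the probit model, the Python sketch below scores one case by computing $\Phi(x_{it}^\top \beta_t)$ with the standard library's `statistics.NormalDist`. The feature vector and coefficients are made-up values, and the function name `fraud_probability` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # N(0, 1)


def fraud_probability(x, beta):
    """Probit model: P[y_it = 1 | x_it, beta_t] = Phi(x_it^T beta_t)."""
    score = sum(xj * bj for xj, bj in zip(x, beta))
    return STD_NORMAL.cdf(score)


# Hypothetical case with three binary rules; the first and third fire.
x_it = [1, 0, 1]
beta_t = [0.4, 0.9, 0.3]
print(round(fraud_probability(x_it, beta_t), 4))  # Phi(0.7), about 0.758
```

Because $\Phi$ is monotone, a larger linear score $x_{it}^\top \beta_t$ always means a higher estimated fraud probability, and a score of zero corresponds to probability 0.5.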

Through data augmentation, the probit model can be expressed in hierarchical form as follows. For each
observation $i$ at time interval $t$, assume a latent random variable $z_{it}$. The binary response $y_{it}$ can be viewed as the
indicator of whether $z_{it} > 0$, i.e. $y_{it} = 1$ if and only if $z_{it} > 0$, and $y_{it} = 0$ if $z_{it} \le 0$. $z_{it}$ can then be
modeled by a linear regression:

$$z_{it} \sim N(x_{it}^\top \beta_t, 1)$$
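One common way to exploit this augmentation is a Gibbs sampler that alternates between drawing each latent $z_{it}$ given $\beta_t$ and drawing $\beta_t$ given the $z_{it}$. The source does not say the model is fitted this way, so the Python sketch below only illustrates the first step under that assumption: given $y_{it}$, draw $z_{it}$ from $N(x_{it}^\top \beta_t, 1)$ truncated to $z > 0$ (fraud) or $z \le 0$ (non-fraud), using inverse-CDF sampling with `statistics.NormalDist`. The function name `sample_latent` is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def sample_latent(y, mean, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 if y == 1, or z <= 0 if y == 0.

    Inverse-CDF method: z = mean + Phi^{-1}(u), with u restricted to the part of
    (0, 1) that maps onto the allowed side of zero.
    """
    cut = STD_NORMAL.cdf(-mean)       # P(z <= 0) when z ~ N(mean, 1)
    r = rng.random()                  # r in [0, 1)
    if y == 1:
        u = cut + (1.0 - cut) * r     # u in [cut, 1)  -> z >= 0
    else:
        u = cut * (1.0 - r)           # u in (0, cut]  -> z <= 0
    # keep u strictly inside (0, 1) so inv_cdf is defined; for extreme means
    # this clamp can push z slightly across the boundary
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


rng = random.Random(0)
x_beta = 0.5  # hypothetical value of x_it^T beta_t
print(sample_latent(1, x_beta, rng) > 0, sample_latent(0, x_beta, rng) <= 0)
```

Since $P[z_{it} > 0] = 1 - \Phi(-x_{it}^\top \beta_t) = \Phi(x_{it}^\top \beta_t)$, the latent form reproduces the probit probability exactly, while conditioning on the $z_{it}$ turns the problem for $\beta_t$ into an ordinary Gaussian linear regression.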

In a Bayesian modeling framework it is common practice to put a Gaussian prior on $\beta_t$.

B. Coefficient Bounds for Fraud Detection 

Incorporating domain knowledge into the modeling framework is important and can sometimes boost model
performance. In our online fraud detection system, the feature set $x$ was proposed by experts with years of
experience. Currently all the features are binary "rules", i.e. a violation of any one of the rules should
increase the probability of fraud. However, simply fitting the model might produce negative coefficients for some
features, either because, given limited training data, the sample size is too small for those coefficients to converge to
the right values, or because some features are highly correlated. Hence we bound the coefficients of these binary "rules"
to force them to be greater than or equal to zero. Specifically, we consider the following optimization problem:

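$$\min_{\beta} \sum_{i} \Big[ y_i \log\big(1 + \exp(-x_i^\top \beta)\big) + (1 - y_i) \log\big(1 + \exp(x_i^\top \beta)\big) + \rho \|\beta\|_k \Big] \quad \text{subject to } \beta \ge 0$$

where $\rho$ weights the norm penalty $\|\beta\|_k$ and the constraint $\beta \ge 0$ holds elementwise.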
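A minimal way to solve a bound-constrained problem of this kind is projected gradient descent: take a gradient step, then clip every rule coefficient back to zero if it went negative. The pure-Python sketch below makes several assumptions that are not in the source: it uses a squared L2 penalty $\rho\|\beta\|_2^2$ added once (instead of the general $\rho\|\beta\|_k$ inside the sum), adds an unconstrained intercept `b0`, and uses a fixed step size and iteration count. The function name `fit_bounded_logistic` and the toy data are hypothetical.

```python
import math


def sigmoid(u):
    # numerically safe logistic function
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_bounded_logistic(X, y, rho=0.01, lr=0.05, iters=3000):
    """Projected gradient descent for
        min_{b0, beta} sum_i [y_i log(1+exp(-u_i)) + (1-y_i) log(1+exp(u_i))] + rho*||beta||_2^2
        subject to beta >= 0,  where u_i = b0 + x_i^T beta (b0 is unconstrained).
    """
    d = len(X[0])
    b0, beta = 0.0, [0.0] * d
    for _ in range(iters):
        g0 = 0.0
        g = [2.0 * rho * bj for bj in beta]  # gradient of the ridge term
        for xi, yi in zip(X, y):
            u = b0 + sum(a * b for a, b in zip(xi, beta))
            r = sigmoid(u) - yi              # derivative of the loss w.r.t. u_i
            g0 += r
            for j in range(d):
                g[j] += r * xi[j]
        b0 -= lr * g0
        # gradient step, then project each rule coefficient onto [0, inf)
        beta = [max(0.0, bj - lr * gj) for bj, gj in zip(beta, g)]
    return b0, beta


# Toy data: rule 0 fires on fraud cases; rule 1 fires mostly on non-fraud cases.
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0], [0, 0], [1, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
b0, beta = fit_bounded_logistic(X, y)
print(beta)
```

In this toy example rule 1 co-occurs mostly with non-fraud cases, so an unconstrained fit would tend to give it a negative weight; the projection pins it at zero instead, matching the expert intuition that firing a rule should never lower the fraud score.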

C. Online Feature Selection through SSVS 

For regression problems with many features, proper shrinkage of the regression coefficients is usually required to
avoid overfitting; two common shrinkage methods are the L1 penalty (Lasso) and the L2 penalty (ridge regression).
In addition, experts often want to monitor the importance of the selection rules so that they can make appropriate
adjustments (e.g. change rules or add new rules). However, fraudulent sellers change their behavioral patterns quickly: a
rule-based feature that does not help today might help a lot tomorrow. It is therefore necessary to build an online feature
selection framework that evolves dynamically, providing both insight into rule importance and strong predictive
performance. In this paper we embed stochastic search variable selection (SSVS) into our online probit regression framework.

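At time interval $t$, let $\beta_{jt}$ be the $j$-th element of the coefficient vector $\beta_t$. Instead of putting a Gaussian
prior on $\beta_{jt}$, we now take its prior to be

$$\beta_{jt} \sim p_{0jt}\, \mathbf{1}(\beta_{jt} = 0) + (1 - p_{0jt})\, N(\mu_{jt}, \sigma^2_{jt})$$

where $p_{0jt}$ is the prior probability of $\beta_{jt}$ being exactly zero, and $N(\mu_{jt}, \sigma^2_{jt})$ is the Gaussian component
that applies when $\beta_{jt}$ is nonzero.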
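To see how the point mass at zero lets the model switch a rule off, the Python sketch below computes the posterior probability that a single coefficient $\beta_{jt}$ is exactly zero. It assumes (this is not from the source) that the data's information about $\beta_{jt}$ is summarized by a Gaussian likelihood with mean `m` and variance `v`; under that assumption the spike's evidence is $N(m; 0, v)$ and the slab's marginal evidence is $N(m; \mu_{jt}, \sigma^2_{jt} + v)$. The function names are hypothetical.

```python
import math


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def posterior_zero_prob(m, v, p0, mu, sigma2):
    """P(beta_j = 0 | data) under the prior p0 * 1(beta_j = 0) + (1 - p0) * N(mu, sigma2),
    assuming a Gaussian likelihood summary N(m, v) for beta_j. Requires 0 < p0 < 1."""
    log_spike = math.log(p0) + log_normal_pdf(m, 0.0, v)               # beta_j == 0
    log_slab = math.log(1.0 - p0) + log_normal_pdf(m, mu, sigma2 + v)  # slab integrated out
    d = log_slab - log_spike
    # stable logistic of -d, i.e. spike / (spike + slab), computed in log space
    if d > 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


# Hypothetical numbers: weak evidence near zero vs. strong evidence away from zero.
print(posterior_zero_prob(0.0, 0.1, 0.5, 0.0, 1.0))
print(posterior_zero_prob(2.0, 0.1, 0.5, 0.0, 1.0))
```

When the data point near zero, most posterior mass sits on the spike and the rule is effectively dropped; when they point away from zero, the slab dominates. Tracking this probability for each rule at each time interval is one way experts could monitor changing feature importance, though how the posterior is carried from one interval to the next is not specified here.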

D. Multiple Instance Learning 

When we looked into the actual expert reviewing and labeling process, we noted that the experts assign
labels in a "bagged" fashion: for each seller identification number, one expert looks through all of the seller's posted
items, and if the expert finds any item to be fraudulent, all of that seller ID's posted items are labeled as fraud. In the
literature, models for this scenario are known as "Multiple Instance Learning". Suppose that for each labeled seller $i$ there
are $K_i$ cases, with feature vectors $x_{i1}, \ldots, x_{iK_i}$. These cases share the same label, which can thus be denoted $y_i$.
The multiple instance learning model with the logistic function becomes

$$P[y_i = 1] = 1 - \prod_{j=1}^{K_i} \frac{1}{1 + \exp(x_{ij}^\top \beta)}$$
which is essentially a noisy-or likelihood function. The noisy-or likelihood only requires at least one case in the
bag to be fraudulent, rather than requiring all cases to be fraud. The optimization problem can thus be written as:

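$$\min_{\beta} \sum_{i} \Big[ -y_i \log\Big(1 - \prod_{j=1}^{K_i} \frac{1}{1 + \exp(x_{ij}^\top \beta)}\Big) + (1 - y_i) \sum_{j=1}^{K_i} \log\big(1 + \exp(x_{ij}^\top \beta)\big) + \rho K_i \|\beta\|_k \Big]$$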
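The noisy-or likelihood and the bag-level objective can be evaluated with a short Python sketch. It assumes binary labels, plain lists for feature vectors, and $\|\beta\|_k$ computed as the usual $L_k$ norm; the function names `bag_fraud_prob` and `mil_objective` are hypothetical. The key identity is $\prod_j 1/(1+\exp(u_j)) = \exp(-s)$ with $s = \sum_j \log(1+\exp(u_j))$, which avoids underflow when a seller has many posted items.

```python
import math


def softplus(u):
    """Numerically stable log(1 + exp(u))."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def dot(x, beta):
    return sum(xj * bj for xj, bj in zip(x, beta))


def bag_fraud_prob(bag, beta):
    """Noisy-or: P[y_i = 1] = 1 - prod_j 1 / (1 + exp(x_ij^T beta))."""
    s = sum(softplus(dot(x, beta)) for x in bag)
    return -math.expm1(-s)  # 1 - exp(-s)


def mil_objective(bags, labels, beta, rho=0.0, k=2):
    """Sum over labeled sellers i of the bracketed term in the MIL objective."""
    norm = sum(abs(b) ** k for b in beta) ** (1.0 / k)
    total = 0.0
    for bag, y in zip(bags, labels):
        s = sum(softplus(dot(x, beta)) for x in bag)  # -log prod_j 1/(1+exp(u_j))
        if y == 1:
            total += -math.log(-math.expm1(-s))       # -log(1 - exp(-s))
        else:
            total += s                                # sum_j log(1 + exp(u_j))
        total += rho * len(bag) * norm                # rho * K_i * ||beta||_k
    return total


# Hypothetical seller with one suspicious item and one neutral item.
print(round(bag_fraud_prob([[1.0], [0.0]], [3.0]), 4))  # 0.9763
```

For a bag with a single case ($K_i = 1$), the positive-bag term reduces to $\log(1 + \exp(-x_i^\top \beta))$ and the objective coincides with the single-instance logistic loss, while a bag's fraud probability is always at least that of its most suspicious item.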

IV. Conclusion 

Online auctions and online shopping have gained increasing recognition with the emergence of the World Wide Web, and in
this paper we considered the problem of building online machine-learned models for identifying auction fraud on e-commerce
websites. As users enjoy the advantages of online trading, fraudsters also take advantage of it to carry out deceptive
activities against honest parties for dishonest profit. Therefore, to detect and prevent such illegal and deceptive
activities, proactive fraud detection moderation systems are commonly applied in practice. Using real-world online auction
fraud detection data, we show that our proposed online probit model framework, which combines coefficient bounds from expert
knowledge, online feature selection, and multiple instance learning, can significantly improve over baselines and the
human-tuned model. This online modeling framework can easily be extended to many other applications. One future direction
is to adjust for selection bias in the online model training process, which has proven to be very effective for offline
models.